Abstract. Straight-line (linear) context-free tree (SLT) grammars have been used
to compactly represent ordered trees. It is well known that equivalence of SLT
grammars is decidable in polynomial time. Here we extend this result and show
that isomorphism of unordered trees given as SLT grammars is decidable in
polynomial time. The proof constructs a compressed version of the canonical form
of the tree represented by the input SLT grammar. The result is generalized to
unrooted trees by “re-rooting” the compressed trees in polynomial time. We
further show that bisimulation equivalence of unrooted unordered trees represented
by SLT grammars is decidable in polynomial time. For non-linear SLT grammars,
which can have double-exponential compression ratios, we prove that unordered
isomorphism is PSPACE-hard and in EXPTIME. The same complexity bounds are
shown for bisimulation equivalence.


1 Introduction 

Deciding isomorphism between various mathematical objects is an important topic in
theoretical computer science that has led to intriguing open problems like the
precise complexity of the graph isomorphism problem. An example of an isomorphism
problem where the knowledge seems to be rather complete is tree isomorphism. Aho,
Hopcroft and Ullman [1, page 84] proved that isomorphism of unordered trees (rooted
or unrooted) can be decided in linear time. An unordered tree is a tree where the
children of a node are not ordered. The precise complexity of tree isomorphism was
finally settled by Lindell [13], Buss [5], and Jenner et al. [11]: Tree isomorphism is
logspace-complete if the trees are represented by pointer structures [?] and
ALOGTIME-complete if the trees are represented by expressions [?]. All these results
deal with trees that are given explicitly (either by an expression or a pointer
structure). In this paper, we deal with the isomorphism problem for trees that are
given in a succinct way. Several succinct encoding schemes for graphs exist in the
literature. Galperin and Wigderson [8] considered graphs that are given by a boolean
circuit for the adjacency matrix. Subsequent work showed that the complexity of a
problem undergoes an exponential jump when going from the standard input
representation to the circuit representation; this phenomenon is known as upgrading,
see [?] for more details and references. Concerning graph isomorphism, it was shown
in [?] that its succinct version is PSPACE-hard, even for very restricted classes of
boolean circuits (DNFs and CNFs).

In this paper, we consider another succinct input representation that has turned out
to be more amenable to efficient algorithms and, in particular, does not show the
upgrading phenomenon known for boolean circuits: straight-line context-free grammars,
i.e., context-free grammars that produce a single object. Such grammars have been
intensively studied for strings and recently also for trees. Using a straight-line
grammar, repeated patterns in a string or tree can be abbreviated by a nonterminal
which can be used in different contexts. For strings, this idea is known as
grammar-based compression [?], and it was extended to trees in [?]. In fact, this
approach can also be extended to general graphs by using hyperedge replacement graph
grammars; the resulting formalism is known as hierarchical graph representation and
was studied under an algorithmic perspective in [12].

The main topic of this paper is the isomorphism problem for trees that are succinctly
represented by straight-line context-free tree grammars. An example of such a grammar
contains the productions $S \to A_0(a)$, $A_i(y) \to A_{i+1}(A_{i+1}(y))$ for
$0 \le i \le n-1$, and $A_n(y) \to f(y, y)$ (here $y$ is called a parameter and in
general several parameters may occur in a rule). This grammar produces a full binary
tree of height $2^n$ and hence has $2^{2^n+1} - 1$ many nodes. This example shows that
a straight-line context-free tree grammar may produce a tree whose size is doubly
exponential in the size of the grammar. The reason for this double exponential blow-up
is copying: the parameter $y$ occurs twice in the right-hand side of the production
$A_n(y) \to f(y, y)$. If this is not allowed, i.e., if every parameter occurs at most
once in every right-hand side, then the grammar is called linear. Straight-line linear
(resp., non-linear) context-free tree grammars are called SLT grammars (resp., ST
grammars) in this paper. SLT grammars generalize dags (directed acyclic graphs): dags
allow sharing repeated subtrees of a tree, whereas SLT grammars can also share repeated
patterns that are not complete subtrees.
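
The blow-up in the example grammar can be checked for small $n$ by expanding it explicitly. The following is only an illustrative sketch: the pair representation `(label, children)` and the function names are assumptions of this example, not part of the paper.

```python
def example_tree(n):
    """Expand S -> A_0(a), A_i(y) -> A_{i+1}(A_{i+1}(y)), A_n(y) -> f(y, y).

    Trees are pairs (label, children); this representation is an assumption
    made for this sketch only.
    """
    def apply(i, arg):
        # Value of A_i applied to the already expanded argument tree.
        if i == n:
            return ("f", (arg, arg))      # copying rule: y occurs twice
        return apply(i + 1, apply(i + 1, arg))

    return apply(0, ("a", ()))


def height(t):
    """Number of edges on a longest root-to-leaf path."""
    return 0 if not t[1] else 1 + max(height(c) for c in t[1])


def size(t):
    """Number of nodes of the fully unfolded tree."""
    return 1 + sum(size(c) for c in t[1])
```

The expansion is feasible only for very small $n$, which is exactly the point of the example: the grammar has $n + 2$ productions, while the unfolded tree has doubly exponentially many nodes.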

It turned out that many algorithmic problems are much harder for trees represented
by ST grammars than for trees represented by SLT grammars. A good example is the
membership problem for tree automata (PTIME-complete for SLT grammars [?] and
PSPACE-complete for ST grammars [15]). A similar situation arises for the isomorphism
problem: We prove that

- the isomorphism problem for (rooted or unrooted) unordered trees that are given
by SLT grammars is PTIME-complete, and

- the isomorphism problem for (rooted or unrooted) unordered trees that are given
by ST grammars is PSPACE-hard and in EXPTIME.

Our polynomial time algorithm for SLT grammars constructs from a given SLT grammar
G a new SLT grammar G' that produces a canonical representation of the tree
produced by G. Our canonical representation of a given rooted unordered tree t is the
ordered rooted tree (in an ordered tree the children of a node are ordered) that has
the lexicographically smallest preorder traversal among all ordered versions of t. For
unrooted SLT-compressed trees, we first compute a compressed representation of the
center node of a given SLT-compressed unrooted tree t. Then we compute an SLT
grammar that produces the rooted version of t that is rooted in the center node. This
is also the standard reduction of the unrooted isomorphism problem to the rooted
isomorphism problem in the uncompressed setting, but it requires some work to carry out
this reduction in polynomial time in the SLT-compressed setting.

Our techniques can also be used to show that checking bisimulation equivalence
of trees that are represented by SLT grammars is PTIME-complete. This generalizes
the well-known PTIME-completeness of bisimulation for dags [?]. In this context, it is
interesting to note that bisimulation equivalence for graphs that are given by
hierarchical graph representations is PSPACE-hard and in EXPTIME.

2 Preliminaries 

For $k \ge 0$ let $[k] = \{1, \ldots, k\}$. Let $\Sigma$ be an alphabet. By $T_\Sigma$
we denote the set of all (ordered, rooted) trees over the alphabet $\Sigma$. It is
defined recursively as the smallest set of strings $T$ such that if $\sigma \in \Sigma$,
$k \ge 0$, and $t_1, \ldots, t_k \in T$, then also $\sigma(t_1, \ldots, t_k)$ is in $T$.
For the tree $a()$ we simply write $a$. The set $D(t)$ of Dewey addresses of a tree
$t = \sigma(t_1, \ldots, t_k)$ is the subset of $\mathbb{N}^*$ defined recursively as
$\{\varepsilon\} \cup \bigcup_{i \in [k]} i \cdot D(t_i)$. Thus $\varepsilon$ denotes the
root node of $t$ and $u \cdot i$ denotes the $i$-th child of $u$. For $u \in D(t)$, we
denote by $t[u] \in \Sigma$ the symbol at $u$, i.e., if $t = \sigma(t_1, \ldots, t_k)$,
then $t[\varepsilon] = \sigma$ and $t[i \cdot u] = t_i[u]$. The size of the tree $t$ is
$|t| = |D(t)|$.
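
As a small illustration of Dewey addresses, the sketch below computes $D(t)$ and $t[u]$ for explicitly given trees. The encoding of a tree as a pair `(label, children)` and of an address as a tuple of 1-based child indices is an assumption of this example.

```python
def dewey(t):
    """Return D(t) for t = (label, children) as a set of index tuples."""
    _label, children = t
    addresses = {()}                       # the empty tuple plays the role of epsilon
    for i, child in enumerate(children, start=1):
        addresses |= {(i,) + u for u in dewey(child)}
    return addresses


def symbol_at(t, u):
    """Return t[u], the label at Dewey address u."""
    label, children = t
    if not u:
        return label
    return symbol_at(children[u[0] - 1], u[1:])
```

With this encoding, the size $|t|$ is simply `len(dewey(t))`.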

A ranked alphabet $N$ is a finite set of symbols each of which is equipped with a
non-negative integer, called its “rank”. We write $A^{(k)}$ to denote that the rank of
$A$ is $k$, and write $N^{(k)}$ for the set of symbols in $N$ that have rank $k$. For an
alphabet $\Sigma$ and a ranked alphabet $N$, we denote by $T_{N \cup \Sigma}$ the set of
trees $t$ over $N \cup \Sigma$ with the property that if $t[u] = A \in N^{(k)}$, then
$u \cdot i \in D(t)$ if and only if $i \in [k]$. Thus, if a node is labeled by
a ranked symbol, then the rank determines the number of children of the node.

We fix a special alphabet $Y = \{y_1, y_2, \ldots\}$ of parameters. For $y_1$ we also
write $y$. The parameters are considered as symbols of rank zero, and by
$T_{\Sigma \cup N}(Y)$ we denote the set of trees from $T_{\Sigma \cup N \cup Y}$ where
each symbol in $Y$ has rank zero. We write $Y_k$ for the set of parameters
$\{y_1, \ldots, y_k\}$. For trees $t, t_1, \ldots, t_k \in T_{\Sigma \cup N}(Y)$ we denote
by $t[y_j \leftarrow t_j \mid j \in [k]]$ the tree obtained from $t$ by replacing in
parallel every occurrence of $y_j$ ($j \in [k]$) by $t_j$.

A context-free tree grammar is a tuple $G = (N, \Sigma, S, P)$ where $N$ is a ranked
alphabet of nonterminal symbols, $\Sigma$ is an alphabet of terminal symbols with
$\Sigma \cap N = \emptyset$, $S \in N^{(0)}$ is the start nonterminal, and $P$ is a finite
set of productions of the form $A(y_1, \ldots, y_k) \to t$ where $A \in N^{(k)}$,
$k \ge 0$, and $t \in T_{N \cup \Sigma}(Y_k)$. Occasionally, we consider context-free tree
grammars without a start nonterminal. Two trees $\xi, \xi' \in T_{N \cup \Sigma}(Y)$ are
in the one-step derivation relation $\Rightarrow_G$ induced by $G$ if $\xi$ has a subtree
$A(t_1, \ldots, t_k)$ with $A \in N^{(k)}$, $k \ge 0$, such that $\xi'$ is obtained from
$\xi$ by replacing this subtree by $t[y_j \leftarrow t_j \mid j \in [k]]$, where
$A(y_1, \ldots, y_k) \to t$ is a production in $P$. The tree language $L(G)$ produced by
$G$ is $\{t \in T_\Sigma \mid S \Rightarrow_G^* t\}$. We assume that $G$ contains no
useless productions, i.e., each production is applied in the derivation of some terminal
tree in $T_\Sigma$. The size of the grammar $G$ is
$|G| = \sum_{(A(y_1, \ldots, y_k) \to t) \in P} |t|$. The grammar $G = (N, \Sigma, S, P)$
is deterministic if for every $A \in N$ there is exactly one production of the form
$A \to t$. The grammar $G$ is acyclic if there is a linear order $<$ on $N$ such that
$A < B$ whenever $B$ occurs in a tree $t$ with $(A \to t) \in P$. A deterministic and
acyclic grammar is called straight-line. Note that $|L(G)| = 1$ for a straight-line
grammar. We denote the unique tree $t$ produced by the straight-line tree grammar $G$
by $\mathrm{val}(G)$. Moreover, for a tree $t \in T_{\Sigma \cup N}(Y)$ we denote with
$\mathrm{val}_G(t) \in T_\Sigma(Y)$ the unique tree obtained from $t$ by applying
productions from $G$ until only terminal symbols from $\Sigma$ occur in the tree. If $G$
is clear from the context, we simply write $\mathrm{val}(t)$ for $\mathrm{val}_G(t)$. The
grammar $G$ is linear if for every production $(A \to t) \in P$ and every $y \in Y$,
$y$ occurs at most once in $t$.
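
The two structural conditions, acyclicity and linearity, are easy to test on an explicit grammar. The sketch below assumes a hypothetical encoding in which a grammar is a dictionary mapping each nonterminal to a pair `(parameters, right-hand side)`, so determinism holds by construction, and trees are pairs `(label, children)`. The function names are illustrative only.

```python
def symbols(t):
    """Yield every label occurring in the tree t = (label, children)."""
    yield t[0]
    for child in t[1]:
        yield from symbols(child)


def is_acyclic(prods):
    """True if no nonterminal can reach itself through right-hand sides."""
    state = {}                      # nonterminal -> "open" (on the stack) or "done"

    def visit(a):
        if state.get(a) == "done":
            return True
        if state.get(a) == "open":
            return False            # a is reachable from itself
        state[a] = "open"
        ok = all(visit(b) for b in symbols(prods[a][1]) if b in prods)
        state[a] = "done"
        return ok

    return all(visit(a) for a in prods)


def is_linear(prods):
    """True if every parameter occurs at most once in its right-hand side."""
    for params, rhs in prods.values():
        labels = list(symbols(rhs))
        if any(labels.count(y) > 1 for y in params):
            return False
    return True
```

In this encoding a grammar is an ST grammar if `is_acyclic` holds, and an SLT grammar if `is_linear` holds as well.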

For a straight-line linear context-free tree grammar we say SLT grammar. For a
(not necessarily linear) straight-line context-free tree grammar we say ST grammar.
Most of this paper is about SLT grammars; only in Section 6 do we deal with (non-linear)
ST grammars. SLT grammars generalize rooted node-labelled dags (directed acyclic
graphs), where the tree defined by such a dag is obtained by unfolding the dag starting
from the root (formally, the nodes of the tree are the directed paths in the dag that
start in the root). A dag can be viewed as an SLT grammar where all nonterminals have
rank 0 (the nodes of the dag correspond to the nonterminals of the SLT grammar). Dags
are less succinct than SLT grammars (take the tree $f^N(a)$ for $N = 2^n$), which in
turn are less succinct than general ST grammars (take a full binary tree of height
$2^n$). We need the following fact:

Lemma 1. A given ST grammar G can be transformed in exponential time into an 
equivalent SLT grammar. 

Proof. In fact, an ST grammar G can be transformed in exponential time into an
equivalent dag. This dag is obtained by viewing the right-hand side
$t(x_1, \ldots, x_n)$ of a G-production $A(x_1, \ldots, x_n) \to t(x_1, \ldots, x_n)$ as
a dag, by merging for all $i \in [n]$ all $x_i$-labelled leaves into a single
$x_i$-labelled node. In this way, G becomes a so-called hyperedge replacement graph
grammar (or hierarchical graph definition in the sense of [12]) that produces a dag of
exponential size, which can be constructed in exponential time from G, and whose
unfolding is $\mathrm{val}(G)$. □

A context is a tree in $T_{\Sigma \cup N}(\{y\})$ with exactly one occurrence of $y$. We
denote with $C_{\Sigma \cup N}$ the set of all contexts and write $C_\Sigma$ for the set
of contexts that contain only symbols from $\Sigma$. For a context $t(y)$ and a tree $t'$
we write $t[t']$ for $t[y \leftarrow t']$. Occasionally, we also consider SLT grammars
where the start nonterminal belongs to $N^{(1)}$, i.e., has rank 1. We call such a
grammar a 1-SLT grammar. Note that $\mathrm{val}(G)$ is a context if G is a 1-SLT
grammar.

In the literature, SLT grammars are usually defined over ranked terminal alphabets.
The following lemma is proved in [?]; the proof immediately carries over to our setting
where $\Sigma$ is not ranked.

Lemma 2. One can transform in polynomial time an SLT grammar into an equivalent
SLT grammar, where each production has one of the following four types (where
$\sigma \in \Sigma$ and $A, B, C, A_1, \ldots, A_k \in N$):

(1) $A \to \sigma(A_1, \ldots, A_k)$,

(2) $A \to B(C)$,

(3) $A(y) \to \sigma(A_1, \ldots, A_i, y, A_{i+1}, \ldots, A_k)$, or

(4) $A(y) \to B(C(y))$.

In particular, note that N contains only nonterminals of rank at most 1.

In the following, we will only deal with SLT grammars G having the property from
Lemma 2. For $i \in [4]$, we denote with $G(i)$ the SLT grammar (without start
nonterminal) consisting of all productions of G of type ($i$) from Lemma 2.
Region Restrictions. A straight-line program (SLP) can be seen as a 1-SLT grammar
$G = (N, \Sigma, S, P)$ containing only productions of the form $A(y) \to B(C(y))$ and
$A(y) \to \sigma(y)$ with $B, C \in N$ and $\sigma \in \Sigma$. Thus, G contains ordinary
rules of a context-free string grammar in Chomsky normal form (but written as monadic
trees). Intuitively, if $\mathrm{val}(G) = a_1(\cdots a_n(y) \cdots)$ then G produces the
string $a_1 \cdots a_n$ and we also write $\mathrm{val}(G) = a_1 \cdots a_n$. For a string
$w = a_1 \cdots a_n$ and two numbers $l, r \in [n]$ with $l \le r$ we denote by $w[l, r]$
the substring $a_l a_{l+1} \cdots a_r$. The following result is a special case of [?],
where it is shown that a so-called composition system (an SLP extended with right-hand
sides of the form $A[l, r]$ for positions $l \le r$) can be transformed into an
ordinary SLP.

Lemma 3. For a given SLP G and two binary encoded numbers $l, r \in [|\mathrm{val}(G)|]$
with $l \le r$ one can compute in polynomial time an SLP G' such that
$\mathrm{val}(G') = \mathrm{val}(G)[l, r]$.
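
One direct way to obtain the statement of Lemma 3 for ordinary SLPs is to descend along the two boundary positions and introduce one new nonterminal per visited rule. The sketch below shows this simple construction; it is not the composition-system transformation cited above. It assumes a string-grammar encoding of SLPs (a dictionary mapping each nonterminal to a single terminal character or to a pair of nonterminals) and assumes that the generated names do not clash with existing ones.

```python
def slp_lengths(rules):
    """Length of val(A) for every nonterminal A, computed bottom-up."""
    lengths = {}

    def length(a):
        if a not in lengths:
            rhs = rules[a]
            lengths[a] = 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])
        return lengths[a]

    for a in rules:
        length(a)
    return lengths


def slp_substring(rules, start, l, r):
    """Return (new_rules, new_start) with val(new_start) = val(start)[l, r]."""
    lengths = slp_lengths(rules)
    new_rules = dict(rules)
    memo = {}

    def cut(a, lo, hi):
        # Nonterminal deriving val(a)[lo, hi], where 1 <= lo <= hi <= |val(a)|.
        if lo == 1 and hi == lengths[a]:
            return a                              # the whole string: reuse a
        key = (a, lo, hi)
        if key not in memo:
            b, c = rules[a]
            nb = lengths[b]
            if hi <= nb:                          # range lies inside val(b)
                memo[key] = cut(b, lo, hi)
            elif lo > nb:                         # range lies inside val(c)
                memo[key] = cut(c, lo - nb, hi - nb)
            else:                                 # suffix of val(b) + prefix of val(c)
                name = f"{a}[{lo},{hi}]"
                new_rules[name] = (cut(b, lo, nb), cut(c, 1, hi - nb))
                memo[key] = name
        return memo[key]

    return new_rules, cut(start, l, r)


def slp_expand(rules, a):
    """Fully expand val(a); only sensible for small examples."""
    rhs = rules[a]
    return rhs if isinstance(rhs, str) else slp_expand(rules, rhs[0]) + slp_expand(rules, rhs[1])
```

After the first split, every recursive call asks for a suffix or a prefix of a nonterminal's value, so the number of new rules is bounded by a constant times the depth of the grammar.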


3 Isomorphism of Unordered SLT-Represented Trees

For a tree $t$ we denote with $\mathrm{uo}(t)$ the unordered rooted version of $t$. It is
the node-labeled directed graph $(V, E, \lambda)$ where $V = D(t)$ is the set of nodes,

$E = \{(u, u \cdot i) \mid i \in \mathbb{N}, u \in \mathbb{N}^*, u \cdot i \in D(t)\}$

is the edge relation, and $\lambda$ is the node-labelling function with
$\lambda(u) = t[u]$. For an SLT grammar G, we also write $\mathrm{val}_{uo}(G)$ for
$\mathrm{uo}(\mathrm{val}(G))$.

In this section, we present a polynomial time algorithm for deciding whether
$\mathrm{uo}(\mathrm{val}(G_1))$ and $\mathrm{uo}(\mathrm{val}(G_2))$ are isomorphic for two
given SLT grammars $G_1$ and $G_2$. For this, we will first define a canonical
representation of a given tree $t$, briefly $\mathrm{canon}(t)$, such that
$\mathrm{uo}(s)$ and $\mathrm{uo}(t)$ are isomorphic if and only if
$\mathrm{canon}(s) = \mathrm{canon}(t)$. Then, we show how to produce for a given SLT
grammar G in polynomial time an SLT grammar for $\mathrm{canon}(\mathrm{val}(G))$.

For reasons that will become clear in a moment we have to restrict to trees
$t \in T_\Sigma$ that have the following property: For all $u, v \in D(t)$, if
$t[u] = t[v]$ then $u$ and $v$ have the same number of children (nodes with the same
label have the same number of children). Such trees are called ranked trees. For the
purpose of deciding the isomorphism problem for unordered SLT-represented trees this is
not a real restriction. Denote for a tree $t \in T_\Sigma$ the ranked tree
$\mathrm{ranked}(t)$ such that $D(t) = D(\mathrm{ranked}(t))$ and for every $u \in D(t)$
with $t[u] = \sigma$: if $u$ has $k$ children in $t$, then
$\mathrm{ranked}(t)[u] = \sigma_k$, where $\sigma_k$ is a new symbol. Then we have:

- $\mathrm{uo}(s)$ and $\mathrm{uo}(t)$ are isomorphic if and only if
$\mathrm{uo}(\mathrm{ranked}(s))$ and $\mathrm{uo}(\mathrm{ranked}(t))$ are isomorphic.

- For an SLT grammar G we construct in polynomial time the SLT grammar
$\mathrm{ranked}(G)$ obtained from G by changing every production $A \to t$ into
$A \to \mathrm{ranked}(t)$, where ranked is extended to trees over $\Sigma$ and
nonterminals by defining $\mathrm{ranked}(t)[u] = t[u]$ if $t[u]$ is a nonterminal. Then
we have $\mathrm{val}(\mathrm{ranked}(G)) = \mathrm{ranked}(\mathrm{val}(G))$.

Hence, in the following we will only consider ranked trees, and all SLT grammars will
produce ranked trees as well.
3.1 Length-Lexicographical Order and Canons

Let us fix the alphabet $\Sigma$. For a tree $t \in T_\Sigma$ we denote by
$\mathrm{dflr}(t)$ its depth-first left-to-right traversal string in $\Sigma^*$. It is
defined as

$\mathrm{dflr}(\sigma(t_1, \ldots, t_k)) = \sigma \, \mathrm{dflr}(t_1) \cdots \mathrm{dflr}(t_k)$

for every $\sigma \in \Sigma$, $k \ge 0$, and $t_1, \ldots, t_k \in T_\Sigma$. Note that
for ranked trees $s$ and $t$ it holds that $\mathrm{dflr}(s) = \mathrm{dflr}(t)$ if and
only if $s = t$. This is the reason for restricting to ranked trees: for unranked trees
this equivalence fails. For instance, $a(a(a))$ and $a(a, a)$ have the same depth-first
left-to-right traversal string $aaa$.

Let $<_\Sigma$ be an order on $\Sigma$; it induces the lexicographical ordering
$<_{\mathrm{lex}}$ on equal-length strings $u, w \in \Sigma^*$ as: $u <_{\mathrm{lex}} w$
if and only if there exist $p, u', w' \in \Sigma^*$ and letters $a, b \in \Sigma$ with
$a <_\Sigma b$ such that $u = pau'$ and $w = pbw'$. The length-lexicographical ordering
$<_{\mathrm{llex}}$ on $\Sigma^*$ is defined by $u <_{\mathrm{llex}} w$ if and only if
(i) $|u| < |w|$ or (ii) $|u| = |w|$ and $u <_{\mathrm{lex}} w$. We extend the definition
of $<_{\mathrm{llex}}$ to trees $s, t$ over $\Sigma$ by $s <_{\mathrm{llex}} t$ if and
only if $\mathrm{dflr}(s) <_{\mathrm{llex}} \mathrm{dflr}(t)$.
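
For explicitly given trees, these definitions translate almost literally into code. The sketch below assumes trees encoded as pairs `(label, children)` and uses Python's built-in comparison of labels as a stand-in for the order $<_\Sigma$. The helper names are illustrative.

```python
def ranked(t):
    """Relabel every node with the pair (symbol, number of children)."""
    label, children = t
    return ((label, len(children)), tuple(ranked(c) for c in children))


def dflr(t):
    """Depth-first left-to-right traversal as a tuple of labels."""
    label, children = t
    out = (label,)
    for child in children:
        out += dflr(child)
    return out


def llex_key(t):
    """Sort key for the length-lexicographical order on trees:
    compare traversal lengths first, then the traversals letter by letter."""
    s = dflr(t)
    return (len(s), s)
```

The example $a(a(a))$ versus $a(a, a)$ from the text shows up here as two different trees with equal `dflr` values, and the collision disappears after applying `ranked`.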

Lemma 4. Let G, H be SLT grammars. It is decidable in polynomial time (1) whether or
not $\mathrm{val}(G) <_{\mathrm{llex}} \mathrm{val}(H)$ and (2) whether or not
$\mathrm{val}(G) = \mathrm{val}(H)$.

Proof. Point (2) was shown in [4] by computing from G, H in polynomial time SLPs
G', H' with $\mathrm{val}(G') = \mathrm{dflr}(\mathrm{val}(G))$ and
$\mathrm{val}(H') = \mathrm{dflr}(\mathrm{val}(H))$. Equivalence of SLPs can be decided in
polynomial time; this was proved independently in [10, 19, 20], cf. [?].

To show (1), we compute in two single bottom-up runs the numbers
$n_1 = |\mathrm{val}(G')|$ and $n_2 = |\mathrm{val}(H')|$. If $n_1 \ne n_2$ we are done;
so assume that $n = n_1 = n_2$. Next, we compute the first position for which the strings
$\mathrm{val}(G')$ and $\mathrm{val}(H')$ differ. This is done via binary search and
polynomially many equivalence tests: We compute $m = \lceil n/2 \rceil$ and, using
Lemma 3, construct SLPs $G_1$ and $G_2$ for $\mathrm{val}(G')[1, m]$ and
$\mathrm{val}(G')[m+1, n]$, respectively, and SLPs $H_1$ and $H_2$ for
$\mathrm{val}(H')[1, m]$ and $\mathrm{val}(H')[m+1, n]$, respectively. We proceed with
$G_1$ and $H_1$ if $\mathrm{val}(G_1) \ne \mathrm{val}(H_1)$, otherwise we proceed with
$G_2$ and $H_2$. After $c \le \lceil \log(n) \rceil$ many steps we obtain SLPs $G_c$,
$H_c$ representing the first position for which $\mathrm{val}(G')$ and $\mathrm{val}(H')$
differ. We compute the terminal symbols $g, h$ with $\mathrm{val}(G_c) = g$ and
$\mathrm{val}(H_c) = h$ and determine whether or not $g <_\Sigma h$. □
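
The binary search in this proof only needs an equality oracle on substrings. The sketch below illustrates the search on plain Python strings, with slicing standing in for Lemma 3 and the parameter `equal` standing in for the SLP equivalence test. Both substitutions are simplifying assumptions of this example, so it is not a compressed implementation.

```python
def first_difference(u, w, equal=lambda x, y: x == y):
    """1-based first position where the equal-length strings u and w differ,
    or None if they are equal. Uses O(log n) calls to the oracle `equal`."""
    if len(u) != len(w):
        raise ValueError("strings must have equal length")
    if equal(u, w):
        return None
    lo, hi = 0, len(u)          # invariant: u[:lo] == w[:lo] and u[lo:hi] != w[lo:hi]
    while hi - lo > 1:
        mid = lo + (hi - lo + 1) // 2
        if not equal(u[lo:mid], w[lo:mid]):
            hi = mid            # the first difference lies in the left half
        else:
            lo = mid            # left half agrees, continue in the right half
    return lo + 1


def llex_less(u, w):
    """Decide u <_llex w, using Python's character order as the order on letters."""
    if len(u) != len(w):
        return len(u) < len(w)
    pos = first_difference(u, w)
    return pos is not None and u[pos - 1] < w[pos - 1]
```

In the compressed setting each slice becomes an application of Lemma 3 and each oracle call an SLP equivalence test, which gives the polynomial bound.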

For a tree $t \in T_\Sigma$ we define its canon $\mathrm{canon}(t)$ as the smallest tree
$s$ with respect to $<_{\mathrm{llex}}$ such that $\mathrm{uo}(s)$ is isomorphic to
$\mathrm{uo}(t)$. Clearly, if $\mathrm{canon}(t) = t$ then also
$\mathrm{canon}(t') = t'$ for every subtree $t'$ of $t$. Hence, in order to determine
$\mathrm{canon}(t)$ for $t = \sigma(t_1, \ldots, t_k)$ ($\sigma \in \Sigma$, $k \ge 0$)
let $c_i = \mathrm{canon}(t_i)$ for $i \in [k]$ and let
$c_{i_1} \le_{\mathrm{llex}} c_{i_2} \le_{\mathrm{llex}} \cdots \le_{\mathrm{llex}} c_{i_k}$
be the length-lexicographically ordered list of the canons $c_1, \ldots, c_k$. Then
$\mathrm{canon}(t) = \sigma(c_{i_1}, \ldots, c_{i_k})$. The following lemma can be easily
shown by an induction on the tree structure:

Lemma 5. Let $s, t \in T_\Sigma$. Then $\mathrm{uo}(s)$ and $\mathrm{uo}(t)$ are
isomorphic if and only if $\mathrm{canon}(s) = \mathrm{canon}(t)$.
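
The recursive description of canons can be sketched for uncompressed trees as follows. The code assumes ranked trees encoded as pairs `(label, children)` and Python's built-in label order in place of $<_\Sigma$. It follows the child-sorting recipe from the text and is meant as an illustration of Lemma 5, not of the compressed construction in Section 3.2.

```python
def dflr(t):
    """Depth-first left-to-right traversal as a tuple of labels."""
    label, children = t
    out = (label,)
    for child in children:
        out += dflr(child)
    return out


def canon(t):
    """Canonize the children recursively, then sort them length-lexicographically."""
    label, children = t
    canons = sorted((canon(c) for c in children),
                    key=lambda c: (len(dflr(c)), dflr(c)))
    return (label, tuple(canons))


def isomorphic_unordered(s, t):
    """Unordered isomorphism test for ranked trees via canon equality."""
    return canon(s) == canon(t)
```

For example, $f(g(a), b)$ and $f(b, g(a))$ receive the same canon $f(b, g(a))$, because the shorter child is listed first.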
3.2 Canonizing SLT-Represented Trees

In the following, we denote a tree $A_1(A_2(\cdots A_n(t) \cdots))$, where
$A_1, A_2, \ldots, A_n$ are unary nonterminals, with $A_1 A_2 \cdots A_n(t)$.

Theorem 6. From a given SLT grammar G one can construct in polynomial time an
SLT grammar G' such that $\mathrm{val}(G') = \mathrm{canon}(\mathrm{val}(G))$.

Proof. Let $G = (N, \Sigma, S, P)$. We assume that G contains no distinct nonterminals
$A_1, A_2 \in N$ such that $\mathrm{val}_G(A_1) = \mathrm{val}_G(A_2)$. This is justified
because we can test $\mathrm{val}_G(A_1) = \mathrm{val}_G(A_2)$ in polynomial time by
Lemma 4 (and replace $A_2$ by $A_1$ in G in such a case). We will add polynomially many
new nonterminals to G and change the productions for nonterminals from N such that for
the resulting SLT grammar G' we have
$\mathrm{val}_{G'}(Z) = \mathrm{canon}(\mathrm{val}_G(Z))$ for every $Z \in N^{(0)}$.

Consider a nonterminal $Z \in N^{(0)}$ and let $M$ be the set of all nonterminals in
G that can be reached from $Z$. By induction, we can assume that G already satisfies
$\mathrm{val}_G(A) = \mathrm{canon}(\mathrm{val}_G(A))$ for every
$A \in (M \cap N^{(0)}) \setminus \{Z\}$. We distinguish two cases.

Case (i). $Z$ is of type (1) from Lemma 2, i.e., has a production
$Z \to \sigma(A_1, \ldots, A_k)$. Using Lemma 4 we construct an ordering
$i_1, \ldots, i_k$ of $[k]$ such that
$\mathrm{val}_G(A_{i_1}) \le_{\mathrm{llex}} \mathrm{val}_G(A_{i_2}) \le_{\mathrm{llex}} \cdots \le_{\mathrm{llex}} \mathrm{val}_G(A_{i_k})$.
We obtain G' by replacing the production $Z \to \sigma(A_1, \ldots, A_k)$ by
$Z \to \sigma(A_{i_1}, \ldots, A_{i_k})$ and get
$\mathrm{val}_{G'}(Z) = \mathrm{canon}(\mathrm{val}_G(Z))$.

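Case (ii). $Z$ is of type (2), i.e., has a production $Z \to B(A)$. Let
$\{S_1, \ldots, S_m\} = (M \cap N^{(0)}) \setminus \{Z\}$ be an ordering such that

$\mathrm{val}_G(S_1) <_{\mathrm{llex}} \mathrm{val}_G(S_2) <_{\mathrm{llex}} \cdots <_{\mathrm{llex}} \mathrm{val}_G(S_m)$.

Note that $A$ is one of these $S_i$. The sequence $S_1, S_2, \ldots, S_m$ partitions the
set of all trees $t$ in $T_\Sigma$ into intervals $I_0, I_1, \ldots, I_m$ with

- $I_0 = \{t \in T_\Sigma \mid t <_{\mathrm{llex}} \mathrm{val}_G(S_1)\}$,

- $I_i = \{t \in T_\Sigma \mid \mathrm{val}_G(S_i) \le_{\mathrm{llex}} t <_{\mathrm{llex}} \mathrm{val}_G(S_{i+1})\}$ for $1 \le i < m$, and

- $I_m = \{t \in T_\Sigma \mid \mathrm{val}_G(S_m) \le_{\mathrm{llex}} t\}$.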

Consider the maximal $G(4)$-derivation starting from $B(A)$, i.e.,

$B(A) \Rightarrow^*_{G(4)} B_1 B_2 \cdots B_N(A)$,

where every $B_i$ is a type-(3) nonterminal. Clearly, the number $N$ might be of
exponential size, but the set $\{B_1, \ldots, B_N\}$ can be easily constructed. In order
to construct an SLT grammar for $\mathrm{canon}(\mathrm{val}_G(Z))$, it remains to reorder
the arguments in right-hand sides of the type-(3) nonterminals $B_i$. The problem is of
course that different occurrences of a type-(3) nonterminal in the sequence
$B_1 B_2 \cdots B_N$ have to be reordered in a different way. But we will show that the
sequence $B_1 B_2 \cdots B_N$ can be split into $m + 1$ blocks such that all occurrences
of a type-(3) nonterminal in one of these blocks have to be reordered in the same way.

Let $t_k = \mathrm{val}_G(B_k B_{k+1} \cdots B_N(A))$ for $k \in [N]$ and
$t_{N+1} = \mathrm{val}_G(A)$. Note that
$t_1 = \mathrm{val}_G(Z) >_{\mathrm{llex}} \mathrm{val}_G(S_m)$ and that
$t_{k+1} <_{\mathrm{llex}} t_k$ for all $k$. For $i \in [m]$ let $k_i$ be the maximal
position $k \le N + 1$ such that $t_k \ge_{\mathrm{llex}} \mathrm{val}_G(S_i)$. Since
$t_1 >_{\mathrm{llex}} \mathrm{val}_G(S_m) \ge_{\mathrm{llex}} \mathrm{val}_G(S_i)$ this
position is well defined. Also note that if $A = S_i$, then we have
$k_i = k_{i-1} = \cdots = k_1 = N + 1$. For every $0 \le i \le m$, the interval
$[k_{i+1} + 1, k_i]$ is the set of all positions $k$ such that $t_k \in I_i$. Here we set
$k_{m+1} = 0$ and $k_0 = N + 1$. Clearly, the interval $[k_{i+1} + 1, k_i]$ might be
empty. The positions $k_0, \ldots, k_m$ can be computed in polynomial time, using binary
search combined with Lemma 4. To apply the latter, note that for a given position $k$ we
can compute in polynomial time an SLT grammar for the tree $t_k$ using Lemma 3 for the
SLP consisting of all type-(4) productions that are used to derive $B_1 B_2 \cdots B_N$.

We now factorize the string $B_1 B_2 \cdots B_N$ as
$B_1 B_2 \cdots B_N = u_m u_{m-1} \cdots u_0$, where $u_m = B_1 \cdots B_{k_m - 1}$ and
$u_i = B_{k_{i+1}} \cdots B_{k_i - 1}$ for $0 \le i \le m - 1$. By Lemma 3 we can compute
in polynomial time an SLP $G_i$ for the string $u_i$. For the further consideration, we
view $G_i$ as a 1-SLT grammar consisting only of type-(4) productions. Note that
$\mathrm{val}(G_i)$ is a linear tree, where every node is labelled with a type-(3)
nonterminal. We now add reordered versions of type-(3) productions to $G_i$. Consider a
type-(3) production
$(C(y) \to \sigma(A_1, \ldots, A_j, y, A_{j+1}, \ldots, A_k)) \in P$ where
$C \in \{B_1, \ldots, B_N\}$. Then we add to $G_i$ the type-(3) production

$C(y) \to \sigma(A_{j_1}, \ldots, A_{j_\nu}, y, A_{j_{\nu+1}}, \ldots, A_{j_k})$,

where $\{j_1, \ldots, j_k\} = [k]$ and $0 \le \nu \le k$ are chosen such that

(1) $\mathrm{val}_G(A_{j_1}) \le_{\mathrm{llex}} \mathrm{val}_G(A_{j_2}) \le_{\mathrm{llex}} \cdots \le_{\mathrm{llex}} \mathrm{val}_G(A_{j_k})$ and

(2) $\mathrm{val}_G(A_{j_\nu}) \le_{\mathrm{llex}} \mathrm{val}_G(S_i) <_{\mathrm{llex}} \mathrm{val}_G(A_{j_{\nu+1}})$.

Note that if $\nu = k$ then condition (2) states that
$\mathrm{val}_G(A_{j_k}) \le_{\mathrm{llex}} \mathrm{val}_G(S_i)$, and if $\nu = 0$ then it
states that $\mathrm{val}_G(S_i) <_{\mathrm{llex}} \mathrm{val}_G(A_{j_1})$. Also note that
condition (2) ensures that for every tree $t \in I_i$ we have
$\mathrm{val}_G(A_{j_\nu}) \le_{\mathrm{llex}} t <_{\mathrm{llex}} \mathrm{val}_G(A_{j_{\nu+1}})$.
Hence, $\mathrm{val}_G(\sigma(A_{j_1}, \ldots, A_{j_\nu}, t, A_{j_{\nu+1}}, \ldots, A_{j_k}))$
is a canon. The crucial observation now is that the above factorization
$u_m u_{m-1} \cdots u_0$ of $B_1 B_2 \cdots B_N$ was defined in such a way that for every
occurrence of a type-(3) nonterminal $C(y)$ in $u_i$, the parameter $y$ will be
substituted by a tree from $I_i$ during the derivation from $Z$ to $\mathrm{val}_G(Z)$.
Hence, we reorder the arguments in the right-hand sides of nonterminal occurrences in
$u_i$ in the correct way to obtain a canon.
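
Conditions (1) and (2) amount to sorting the sibling arguments and locating the lower bound of the block's interval among them. The sketch below illustrates this with comparable keys standing in for the trees: pairs `(length, traversal)` play the role of the length-lexicographical order, and the function name and key encoding are assumptions of this example.

```python
from bisect import bisect_right


def reorder_type3(sibling_keys, block_lower_key):
    """Return (ordered, nu): the siblings in sorted order and the number nu of
    siblings placed before the parameter y.

    sibling_keys     stand-ins for val_G(A_1), ..., val_G(A_k)
    block_lower_key  stand-in for val_G(S_i), the lower bound of the interval I_i
    """
    ordered = sorted(sibling_keys)                     # condition (1)
    nu = bisect_right(ordered, block_lower_key)        # condition (2): keys <= bound
    return ordered, nu
```

Since every tree substituted for $y$ inside block $u_i$ lies in $I_i$, the same pair `(ordered, nu)` works for all occurrences of the production in that block.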

We now rename the nonterminals in the SLT grammars $G_i$ (which are now of type (3) and
type (4)) so that the nonterminal sets of $G, G_0, \ldots, G_m$ are pairwise disjoint.
Let $X_i(y)$ be the start nonterminal of $G_i$ after the renaming. Then we add to the
current SLT grammar G the union of all the $G_i$, and replace the production
$Z \to B(A)$ by $Z \to X_m X_{m-1} \cdots X_0(A)$. The construction implies that
$\mathrm{val}_{G'}(Z) = \mathrm{canon}(\mathrm{val}_G(Z))$ for the resulting grammar G'.

It remains to argue that the above construction can be carried out in polynomial
time. All steps only need polynomial time in the size of the current SLT grammar.
Hence, it suffices to show that the size of the SLT grammar is polynomially bounded.
The algorithm is divided into $|N^{(0)}|$ many phases, where in each phase it enforces
$\mathrm{val}_{G'}(Z) = \mathrm{canon}(\mathrm{val}_G(Z))$ for a single nonterminal $Z$.
Consider a single phase, where $\mathrm{val}_{G'}(Z) = \mathrm{canon}(\mathrm{val}_G(Z))$
is enforced for a nonterminal $Z$. In this phase, we (i) change the production for $Z$
and (ii) add new type-(3) and type-(4) productions to G (the union of the $G_i$ above).
But the number of these new productions is polynomially bounded in the size of the
initial SLT grammar (the one before the first phase), because the nonterminals
introduced in earlier phases are not relevant for the current phase. This implies that
the additive size increase in each phase is bounded polynomially in the size of the
initial grammar. □

Corollary 7. The problem of deciding whether $\mathrm{val}_{uo}(G_1)$ and
$\mathrm{val}_{uo}(G_2)$ are isomorphic for given SLT grammars $G_1$ and $G_2$ is
PTIME-complete.

Proof. Membership in PTIME follows immediately from Lemma 4, Lemma 5, and Theorem 6.
Moreover, PTIME-hardness already holds for dags, i.e., SLT grammars where
all nonterminals have rank 0, as shown in [18]. □

4 Isomorphism of Unrooted Unordered SLT-Represented Trees 

An unrooted unordered tree $t$ over $\Sigma$ can be seen as a node-labeled (undirected)
graph $t = (V, E, \lambda)$, where $E \subseteq V \times V$ is symmetric and
$\lambda : V \to \Sigma$. For a node $v$ of $t$ we define the eccentricity
$\mathrm{ecc}_t(v) = \max_{u \in V} \delta_t(u, v)$ and the diameter
$\varnothing(t) = \max_{v \in V} \mathrm{ecc}_t(v)$, where $\delta_t(u, v)$ denotes the
distance from $u$ to $v$ (i.e., the number of edges on the path from $u$ to $v$ in $t$).

Let $t \in T_\Sigma$ be a rooted ordered tree over $\Sigma$ and let
$t' = \mathrm{uo}(t) = (V, E, \lambda)$ be the rooted unordered tree corresponding to
$t$. The tree $\mathrm{ur}(t') = (V, E \cup E^{-1}, \lambda)$ over $\Sigma$ is the
unrooted version of $t'$. An unrooted unordered tree $t$ can be represented by an SLT
grammar G by forgetting the order and root information present in G. Let
$\mathrm{val}_{ur,uo}(G) = \mathrm{ur}(\mathrm{uo}(\mathrm{val}(G)))$.

In this section it is proved that isomorphism for unrooted unordered trees $t_1, t_2$
represented by SLT grammars $G_1, G_2$, respectively, can be solved in polynomial time
with respect to $|G_1| + |G_2|$. We reduce the problem to the (rooted) unordered case
that was solved in Corollary 7.

Let $t = (V, E, \lambda)$ be an unordered unrooted tree. A node $u$ of $t$ is called a
center node of $t$ if for all leaves $v$ of $t$:

$\delta_t(u, v) \le (\varnothing(t) + 1)/2$.

Let $\mathrm{center}(t)$ be the set of all center nodes of $t$. One can compute the
center nodes by deleting all leaves of the tree and iterating this step, until the
current tree consists of at most two nodes. These are the center nodes of $t$. In
particular, $t$ has either one or two center nodes. Another characterization of center
nodes that is important for our algorithm is via longest paths. Let
$p = (v_0, v_1, \ldots, v_n)$ be a longest simple path in $t$, i.e.,
$n = \varnothing(t)$. Then the middle points $v_{\lfloor n/2 \rfloor}$ and
$v_{\lceil n/2 \rceil}$ (which are identical if $n$ is even) are the center nodes of
$t$. These nodes are independent of the concrete longest path $p$.
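
For an explicitly given tree, the leaf-deletion procedure can be sketched as follows. The adjacency-dictionary encoding and the function name are assumptions of this example; the paper's contribution is to find the center on the compressed representation, which this sketch does not attempt.

```python
def centers(adj):
    """Center nodes of an unrooted tree given as {node: set of neighbours}."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    leaves = [v for v in adj if degree[v] <= 1]
    while len(remaining) > 2:
        next_leaves = []
        for v in leaves:                  # delete the current layer of leaves
            remaining.discard(v)
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] == 1:    # w becomes a leaf of the pruned tree
                        next_leaves.append(w)
        leaves = next_leaves
    return remaining
```

On a path with five nodes the function returns the single middle node, and on a path with four nodes it returns the two middle nodes, matching the parity statement about the diameter.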

Note that there are two center nodes if and only if $\varnothing(t)$ is odd. Since our
constructions are simpler if a unique center node exists, we first make sure that
$\varnothing(t)$ is even. Let # be a new symbol not in $\Sigma$. For an unrooted
unordered tree $t$ we denote by $\mathrm{even}(t)$ the tree where every pair of edges
$(u, v), (v, u)$ is replaced by the edges $(u, v'), (v', v), (v, v'), (v', u)$, where
$v'$ is a new node labelled #. Then for an SLT grammar $G = (N, \Sigma, S, P)$ we let
$\mathrm{even}(G) = (N, \Sigma \cup \{\#\}, S, P')$ be the SLT grammar where $P'$ is
obtained from $P$ by replacing every subtree $\sigma(t_1, \ldots, t_k)$ with
$\sigma \in \Sigma$, $k \ge 1$, in a right-hand side by the subtree
$\sigma(\#(t_1), \ldots, \#(t_k))$. Observe that

- $\mathrm{val}_{ur,uo}(\mathrm{even}(G)) = \mathrm{even}(\mathrm{val}_{ur,uo}(G))$,

- $\varnothing(\mathrm{even}(t)) = 2 \cdot \varnothing(t)$ is even, i.e.,
$\mathrm{even}(t)$ has only one center node, and

- trees $t$ and $s$ are isomorphic if and only if $\mathrm{even}(t)$ and
$\mathrm{even}(s)$ are isomorphic.

Since $\mathrm{even}(G)$ can be constructed in polynomial time, we assume in the
following that every SLT grammar produces a tree of even diameter and therefore has only
one center node. For a tree $t$ of even diameter, we denote with $\mathrm{center}(t)$ its
unique center node.

Let $u \in V$. We construct a rooted version $\mathrm{root}(t, u)$ of $t$ with root node
$u$. We set $\mathrm{root}(t, u) = (V, E', \lambda)$, where
$E' = \{(v, v') \in E \mid \delta_t(u, v) < \delta_t(u, v')\}$.

Two unrooted unordered trees $t_1, t_2$ of even diameter are isomorphic if and only if
$\mathrm{root}(t_1, \mathrm{center}(t_1))$ is isomorphic to
$\mathrm{root}(t_2, \mathrm{center}(t_2))$. Thus, we can solve in polynomial time the
isomorphism problem for unrooted unordered trees represented by SLT grammars G, G' by

(1) determining in polynomial time compressed representations $\tilde{u}_1$ and
$\tilde{u}_2$ of $u_1 = \mathrm{center}(\mathrm{val}_{ur,uo}(G))$ and
$u_2 = \mathrm{center}(\mathrm{val}_{ur,uo}(G'))$, respectively (Section 4.1),

(2) constructing in polynomial time SLT grammars $G_1, G_2$ such that
$\mathrm{val}_{uo}(G_1) = \mathrm{root}(\mathrm{val}_{ur,uo}(G), u_1)$ and
$\mathrm{val}_{uo}(G_2) = \mathrm{root}(\mathrm{val}_{ur,uo}(G'), u_2)$ (Section 4.2), and

(3) testing in polynomial time if $\mathrm{val}_{uo}(G_1)$ is isomorphic to
$\mathrm{val}_{uo}(G_2)$ (Corollary 7).

4.1 Finding Center Nodes 

Let $G = (N, \Sigma, S, P)$ be an SLT grammar. A G-compressed path $p$ is a string of
pairs $p = (A_1, u_1) \cdots (A_n, u_n)$ such that for all $i \in [n]$, $A_i \in N$,
$A_1 = S$, $u_i \in D(t_i)$ is a Dewey address in $t_i$ where $(A_i \to t_i) \in P$,
$t_i[u_i] = A_{i+1}$ for $i < n$, and $t_n[u_n] \in \Sigma$. If we omit the condition
$t_n[u_n] \in \Sigma$, then $p$ is a partial G-compressed path. Note that by definition,
$n \le |N|$. A partial G-compressed path uniquely represents one particular node in the
derivation tree of G, and a G-compressed path represents a leaf of the derivation tree
and hence a node of $\mathrm{val}(G)$. We denote this node by $\mathrm{val}_G(p)$. The
concatenation $u_1 u_2 \cdots u_n$ of the Dewey addresses is denoted by $u(p)$.

For a context $t(y) \in C_\Sigma$ we define $\mathrm{ecc}(t) = \mathrm{ecc}_t(y)$ (recall
that in a context there is a unique occurrence of the parameter $y$) and
$\mathrm{rty}(t) = \delta_t(\varepsilon, y)$ (the distance from the root to the parameter
$y$). For a tree $s \in T_\Sigma$ we denote with $h(s)$ its height. We extend these
notions to contexts $t \in C_{\Sigma \cup N}$ and trees $s \in T_{\Sigma \cup N}$ by
$\mathrm{ecc}(t) = \mathrm{ecc}(\mathrm{val}_G(t))$,
$\mathrm{rty}(t) = \mathrm{rty}(\mathrm{val}_G(t))$, and $h(s) = h(\mathrm{val}_G(s))$.

Eccentricity, distance from root to $y$, and height can be computed in polynomial time
for all nonterminals bottom-up. To do so, observe that for two contexts
$t(y), t'(y) \in C_{\Sigma \cup N}$ and a tree $s \in T_{\Sigma \cup N}$ we have

- $\mathrm{rty}(t[t']) = \mathrm{rty}(t) + \mathrm{rty}(t')$,

- $\mathrm{ecc}(t[t']) = \max\{\mathrm{ecc}(t'), \mathrm{ecc}(t) + \mathrm{rty}(t')\}$, and

- $h(t[s]) = \max\{h(t), \mathrm{rty}(t) + h(s)\}$.
Similarly, for a context $t(y) = \sigma(s_1, \ldots, s_j, y, s_{j+1}, \ldots, s_k)$ and a
tree $s = \sigma(s_1, \ldots, s_k)$ we have:

- $\mathrm{rty}(t) = 1$,

- $\mathrm{ecc}(t) = 2 + \max\{h(s_i) \mid 1 \le i \le k\}$, and

- $h(s) = 1 + \max\{h(s_i) \mid 1 \le i \le k\}$.

Finally, note that for the tree $t[s]$ ($t(y) \in C_\Sigma$, $s \in T_\Sigma$) we have

$\varnothing(t[s]) = \max\{\varnothing(t), \varnothing(s), \mathrm{ecc}(t) + h(s)\}$. (1)
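
The recurrences for $\mathrm{rty}$, $\mathrm{ecc}$ and $h$ can be evaluated bottom-up over a grammar in the normal form of Lemma 2. The sketch below assumes a hypothetical dictionary encoding of the four production types (tags `"t1"` to `"t4"`), measures height in edges, and treats the parameter $y$ as a leaf when computing the height of a context. For a type-(3) production without siblings it uses $\mathrm{ecc} = 1$, a boundary case that the formulas above leave implicit.

```python
def measures(prods):
    """Compute h, rty and ecc for all nonterminals of a normal-form SLT grammar.

    prods maps a nonterminal to one of (hypothetical encoding):
      ("t1", sigma, [A_1, ..., A_k])   A    -> sigma(A_1, ..., A_k)
      ("t2", B, C)                     A    -> B(C)
      ("t3", sigma, [siblings of y])   A(y) -> sigma(..., y, ...)
      ("t4", B, C)                     A(y) -> B(C(y))
    """
    h, rty, ecc = {}, {}, {}

    def visit(a):
        if a in h:
            return
        kind, x, z = prods[a]
        if kind == "t1":
            for b in z:
                visit(b)
            h[a] = 1 + max((h[b] for b in z), default=-1)   # a leaf has height 0
        elif kind == "t3":
            for b in z:
                visit(b)
            top = max((h[b] for b in z), default=-1)
            rty[a] = 1
            ecc[a] = max(1, 2 + top)      # 1 if y is the only child (assumption)
            h[a] = 1 + max(0, top)        # y counts as a leaf of the context
        else:                             # "t2" and "t4": composition B[C]
            visit(x)
            visit(z)
            h[a] = max(h[x], rty[x] + h[z])
            if kind == "t4":
                rty[a] = rty[x] + rty[z]
                ecc[a] = max(ecc[z], ecc[x] + rty[z])

    for a in prods:
        visit(a)
    return h, rty, ecc
```

Each nonterminal is visited once, so the running time is linear in the size of the grammar, even when the represented tree is exponentially large.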

Our search for the center node of an SLT-compressed tree is based on the following
lemma. For a context $t(y) \in C_\Sigma$, where $u$ is the Dewey address of the parameter $y$, and
a tree $s \in T_\Sigma$ we say that a node $v$ of $t[s]$ belongs to $t$ if the Dewey address of $v$ is in
$D(t) \setminus \{u\}$. Otherwise, we say that $v$ belongs to $s$, which means that $u$ is a prefix of the
Dewey address of $v$.

Lemma 8. Let $t(y) \in C_\Sigma$ be a context and $s \in T_\Sigma$ a tree such that $\varnothing(t[s])$ is even. Let
$c = \mathrm{center}(t[s])$. Then we have the following:

- If $\mathrm{ecc}(t) \le h(s)$ then $c$ belongs to $s$.

- If $\mathrm{ecc}(t) > h(s)$ then $c$ belongs to $t$.

Proof. Let us first assume that $\mathrm{ecc}(t) \le h(s)$. Then we have $\varnothing(t) \le 2 \cdot \mathrm{ecc}(t) \le
\mathrm{ecc}(t) + h(s)$, i.e., $\varnothing(t[s]) = \max\{\varnothing(s), \mathrm{ecc}(t) + h(s)\}$ by (1). Together with $\mathrm{ecc}(t) \le
h(s)$ this implies that the middle point of a longest path in $t[s]$ (which is $c$) belongs to
the tree $s$.

Next, assume that $\mathrm{ecc}(t) = h(s) + 1$. Then we have $\varnothing(s) \le 2 \cdot h(s) < \mathrm{ecc}(t) + h(s)$,
i.e., $\varnothing(t[s]) = \max\{\varnothing(t), \mathrm{ecc}(t) + h(s)\}$. Moreover, we claim that $\mathrm{ecc}(t) + h(s) \ge
\varnothing(t)$. In case $\varnothing(t) = \mathrm{ecc}(t)$, this is clear. Otherwise, $\varnothing(t) > \mathrm{ecc}(t)$ and a longest path
in $t$ does not end in the parameter node $y$. It follows that $\varnothing(t) \le 2 \cdot (\mathrm{ecc}(t) - 1) <
\mathrm{ecc}(t) + h(s)$. Thus, we have $\varnothing(t[s]) = \mathrm{ecc}(t) + h(s) = 2 \cdot h(s) + 1$, which is odd, a
contradiction. Hence, this case cannot occur.

Finally, assume that $\mathrm{ecc}(t) > h(s) + 1$. Again, we get $\varnothing(t[s]) = \max\{\varnothing(t), \mathrm{ecc}(t) +
h(s)\}$. Moreover, since $\mathrm{ecc}(t) > h(s) + 1$, the center node $c$ must belong to $t$. □
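Equation (1) and Lemma 8 can be checked by brute force on small explicit trees. The sketch below is only an illustration under assumed conventions: trees are nested tuples `(label, child, ...)`, the parameter is the string `"y"`, nodes are identified by their Dewey addresses, and all function names are hypothetical.

```python
def walk(t, addr=()):
    """Yield (Dewey address, subtree) for every node of t."""
    yield addr, t
    if t != "y":
        for i, c in enumerate(t[1:], 1):
            yield from walk(c, addr + (i,))

def plug(t, s):
    """Replace the parameter y of the context t by the tree s."""
    if t == "y":
        return s
    return (t[0],) + tuple(plug(c, s) for c in t[1:])

def dist(u, v):
    """Distance of two nodes: path lengths below their common prefix."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return len(u) + len(v) - 2 * k

def ecc(nodes, u):
    return max(dist(u, v) for v in nodes)

def diameter(nodes):
    return max(ecc(nodes, u) for u in nodes)

def lemma8_data(t, s):
    nt = [a for a, _ in walk(t)]
    ns = [a for a, _ in walk(s)]
    y = next(a for a, sub in walk(t) if sub == "y")
    whole = [a for a, _ in walk(plug(t, s))]
    ecc_t, h_s = ecc(nt, y), max(len(a) for a in ns)
    eq1 = diameter(whole) == max(diameter(nt), diameter(ns), ecc_t + h_s)
    c = min(whole, key=lambda u: ecc(whole, u))  # center (even diameter)
    return eq1, ecc_t, h_s, c[:len(y)] == y      # last entry: c belongs to s

# t = f(a, y), s = g(g(b)): ecc(t) = 2 <= h(s) = 2, center lies in s.
r1 = lemma8_data(("f", ("a",), "y"), ("g", ("g", ("b",))))
# t = f(g(g(a)), y), s = b: ecc(t) = 4 > h(s) = 0, center lies in t.
r2 = lemma8_data(("f", ("g", ("g", ("a",))), "y"), ("b",))
```

Both example trees have diameter 4, so the center is a single node, as Lemma 8 requires.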

Lemma 9. For a given SLT grammar $G$ such that $\mathrm{val}_{ur,uo}(G)$ has even diameter, one
can construct a $G$-compressed path for $\mathrm{center}(\mathrm{val}_{ur,uo}(G))$.

Proof. Consider the recursive Algorithm 1. It is started with $t_\ell = y$, $t_r = p = \varepsilon$
and $A = S$ and computes the node $\mathrm{center}(\mathrm{val}_{ur,uo}(G))$. The following invariants are
preserved by the algorithm: If $\mathrm{center}(t_\ell, A, t_r, p)$ is called, then we have:

- If $A$ has rank 0 then $t_r = \varepsilon$.

- $\mathrm{val}(G) = \mathrm{val}(t_\ell[A[t_r]])$ (here we set $t[\varepsilon] = t$).

- The tree $t_\ell[A[t_r]]$ can be derived from the start variable $S$.

- $p$ is the partial $G$-compressed path to the distinguished $A$ in $t_\ell[A[t_r]]$.

- $\mathrm{center}(\mathrm{val}_{ur,uo}(G))$ belongs to the subcontext $\mathrm{val}(A)$ in $\mathrm{val}(t_\ell)[\mathrm{val}(A)[\mathrm{val}(t_r)]]$.
Algorithm 1 Recursive procedure to find the $G$-compressed path for the center node
procedure center($t_\ell$, $A$, $t_r$, $p$)
    if $A \to B(C)$ (and thus $t_r = \varepsilon$) or $A(y) \to B(C(y))$ then
        if $\mathrm{ecc}(t_\ell[B(y)]) \le h(C[t_r])$ then
            return center($t_\ell[B(y)]$, $C$, $t_r$, $p \cdot (A, 1)$)
        else
            return center($t_\ell$, $B$, $C[t_r]$, $p \cdot (A, \varepsilon)$)
    if $A \to \sigma(A_1, \ldots, A_k)$ (and thus $t_r = \varepsilon$) then
        $t_i \leftarrow t_\ell[\sigma(A_1, \ldots, A_{i-1}, y, A_{i+1}, \ldots, A_k)]$ for all $i \in [k]$
        if there is an $i \in [k]$ such that $\mathrm{ecc}(t_i) \le h(A_i)$ then
            return center($t_i$, $A_i$, $\varepsilon$, $p \cdot (A, i)$)
        else
            return $p \cdot (A, \varepsilon)$
    if $A(y) \to \sigma(A_1, \ldots, A_{s-1}, y, A_{s+1}, \ldots, A_k)$ then
        $t_i \leftarrow t_\ell[\sigma(A_1, \ldots, A_{i-1}, y, A_{i+1}, \ldots, A_{s-1}, t_r, A_{s+1}, \ldots, A_k)]$ if $i < s$
        $t_i \leftarrow t_\ell[\sigma(A_1, \ldots, A_{s-1}, t_r, A_{s+1}, \ldots, A_{i-1}, y, A_{i+1}, \ldots, A_k)]$ if $s < i$
        if there is an $i \in [k] \setminus \{s\}$ such that $\mathrm{ecc}(t_i) \le h(A_i)$ then
            return center($t_i$, $A_i$, $\varepsilon$, $p \cdot (A, i)$)
        else
            return $p \cdot (A, \varepsilon)$


For a call $\mathrm{center}(t_\ell, A, t_r, p)$, the algorithm distinguishes cases according to the
right-hand side of $A$. If this right-hand side has the form $B(C)$ or $B(C(y))$, then, by
comparing $\mathrm{ecc}(t_\ell[B(y)])$ and $h(C[t_r])$, we determine whether the search for the center
node has to continue in $B$ or $C$, see Lemma 8.

The case that the right-hand side of $A$ has the form $\sigma(A_1, \ldots, A_k)$ is a bit more
complicated. Let $s_\ell = \mathrm{val}(t_\ell)$ and $s_i = \mathrm{val}(A_i)$ (by the first invariant we know that
$t_r = \varepsilon$). We have to find the center node of $t := s_\ell[\sigma(s_1, \ldots, s_k)]$ and by the last
invariant we know that it is contained in $\sigma(s_1, \ldots, s_k)$. We now consider all $k$ many
cuts of $t$ along one of the edges between the $\sigma$-node and one of the $s_i$, i.e., we cut $t$ into
$s_\ell[\sigma(s_1, \ldots, s_{i-1}, y, s_{i+1}, \ldots, s_k)]$ and $s_i$. Using again Lemma 8, it suffices to compare
$\mathrm{ecc}(s_\ell[\sigma(s_1, \ldots, s_{i-1}, y, s_{i+1}, \ldots, s_k)])$ and $h(s_i)$ in order to determine whether the
center node belongs to $s_\ell[\sigma(s_1, \ldots, s_{i-1}, y, s_{i+1}, \ldots, s_k)]$ or $s_i$. If for some $i$, it turns
out that the center node is in $s_i$, then we continue the search with $A_i$. Finally, assume
that for all $i$, it turns out that the center node is in $s_\ell[\sigma(s_1, \ldots, s_{i-1}, y, s_{i+1}, \ldots, s_k)]$.
Since by the last invariant, the center node is in $\sigma(s_1, \ldots, s_k)$, the $\sigma$-labelled node must
be the center node. The case of a production $A(y) \to \sigma(A_1, \ldots, A_{s-1}, y, A_{s+1}, \ldots, A_k)$
can be dealt with similarly.

Note that $|t_\ell| + |t_r|$ stays bounded by the size of $G$. Hence, whenever $\mathrm{ecc}(t)$ and
$h(t)$ have to be determined by the algorithm, $t$ is a polynomial-size tree built from
terminal and nonterminal symbols. By the previous remarks, $\mathrm{ecc}(t)$ and $h(t)$ can be
computed in polynomial time. □
4.2 Re-Rooting of SLT Grammars

Let $G = (N, \Sigma, S, P)$ be an SLT grammar (as usual, having the normal form from
Lemma 2) and $p$ a $G$-compressed path. Let $s(p) \in T_{\Sigma \cup N}$ be the tree defined inductively
as follows: Let $(A \to t) \in P$ and $u \in D(t)$. Then $s((A, u)) = t$. If $p = (A, u)p'$ with
$p'$ non-empty, then either (i) $u = \varepsilon$ and $t = B(C)$ or (ii) $u = i \in \mathbb{N}$ and $t[i] \in N^{(0)}$.
In case (i) we set $s(p) = s(p')[C]$, in case (ii) we set $s(p) = t'[s(p')]$, where $t'(y)$
is obtained from $t$ by replacing the $i$-th argument of the root by $y$. Note that $s(p')$ is a
context from $C_{\Sigma \cup N}$ (with parameter $y$) if $p'$ starts with a nonterminal of rank 1. Let
$s = s(p)$; its size is bounded by the size of $G$. Note that $s[u(p)]$ is a terminal symbol
(recall that $u(p)$ denotes the concatenation of the Dewey addresses in $p$). Assume that
$s[u(p)] = \sigma \in \Sigma$. Let $\#$ be a fresh symbol and let $s'$ be obtained from $s$ by changing
the label at $u(p)$ from $\sigma$ to $\#$. Let $s' \Rightarrow_G^* s''$ be the shortest derivation such that
$s''[\varepsilon] = \delta \in \Sigma$ (it consists of at most $|N|$ derivation steps). We denote the $\#$-labelled
node in $s''$ by $u$. Finally, let $t$ be obtained from $s''$ by changing the unique $\#$ into $\sigma$.
We define the $p$-expansion of $G$, denoted $\mathrm{ex}_G(p)$, as the tuple $(t, u, \sigma, \delta)$. Note that
$\mathrm{val}_G(p)$ is the unique $\#$-labelled node in $\mathrm{val}_G(s'')$. Moreover, the $p$-expansion can be
computed in polynomial time from $G$ and $p$.

The $p$-expansion $(t, u, \sigma, \delta)$ has all information needed to construct a grammar $G'$
representing the rooted version at $p$ of $\mathrm{val}(G)$. If $u = \varepsilon$ then also $\mathrm{val}_G(p) = \varepsilon$. Since $G$
is already rooted at $\varepsilon$, nothing has to be done in this case and we return $G' = G$. If $u \ne \varepsilon$
then $\mathrm{val}_G(p) \ne \varepsilon$ and hence $t$ contains two terminal nodes which uniquely represent
the root node and the node $\mathrm{val}_G(p)$ of the tree $\mathrm{val}(G)$.

Let $s_1 \in T_\Sigma$ be a rooted ordered tree representing the unrooted unordered tree
$\tilde{s}_1 = \mathrm{ur}(\mathrm{uo}(s_1))$. Let $u \ne \varepsilon$ be a node of $s_1$. Let $s_1[\varepsilon] = \delta \in \Sigma$ and $s_1[u] = \sigma \in \Sigma$.
A rooted ordered tree $s_2$ that represents the rooted unordered tree $\tilde{s}_2 = \mathrm{root}(\tilde{s}_1, u)$ can
be defined as follows: Since $u \ne \varepsilon$, we can write

$$s_1 = \delta(\zeta_1, \ldots, \zeta_{i-1}, t'[\sigma(\xi_1, \ldots, \xi_m)], \zeta_{i+1}, \ldots, \zeta_k),$$

where $t'$ is a context, and $u = iu'$, where $u'$ is the Dewey address of the parameter $y$ in
$t'$. We can define $s_2$ as

$$s_2 = \sigma(\xi_1, \ldots, \xi_m, \mathrm{rooty}(t')[\delta(\zeta_1, \ldots, \zeta_{i-1}, \zeta_{i+1}, \ldots, \zeta_k)]),$$

where $\mathrm{rooty}$ is a function mapping contexts to contexts defined recursively as follows,
where $f \in \Sigma$, $t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_\ell \in T_\Sigma$ and $t(y), t'(y) \in C_\Sigma$:

$$\mathrm{rooty}(y) = y \tag{2}$$

$$\mathrm{rooty}(f(t_1, \ldots, t_{i-1}, y, t_{i+1}, \ldots, t_\ell)) = f(t_1, \ldots, t_{i-1}, y, t_{i+1}, \ldots, t_\ell) \tag{3}$$

$$\mathrm{rooty}(t[t'(y)]) = \mathrm{rooty}(t')[\mathrm{rooty}(t(y))] \tag{4}$$

Intuitively, the mapping $\mathrm{rooty}$ unroots a context $t(y)$ towards its $y$-node $u$, i.e., it
reverses the path from the root to $u$. Thus, for instance, $\mathrm{rooty}(f(a, y, b)) = f(a, y, b)$ and
$\mathrm{rooty}(f(a, g(c, y, d), b)) = g(c, f(a, y, b), d)$.
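On explicit (uncompressed) trees, the function rooty and the definition of the re-rooted tree can be implemented directly. The sketch below assumes trees encoded as nested tuples `(label, child, ...)` with the string `"y"` as parameter and Dewey addresses as tuples of 1-based child indices; the names `plug`, `has_y` and `reroot` are hypothetical helpers, not notation from the paper.

```python
Y = "y"

def plug(t, s):
    """Replace the parameter y of the context t by s."""
    if t == Y:
        return s
    return (t[0],) + tuple(plug(c, s) for c in t[1:])

def has_y(t):
    return t == Y or any(has_y(c) for c in t[1:])

def rooty(t):
    if t == Y:                                  # equation (2)
        return Y
    i = next(k for k, c in enumerate(t) if k > 0 and has_y(c))
    if t[i] == Y:                               # equation (3)
        return t
    top = t[:i] + (Y,) + t[i + 1:]              # one-level context above t[i]
    return plug(rooty(t[i]), top)               # equation (4), rooty(top) = top

def reroot(s1, u):
    """Rooted ordered tree representing root(ur(uo(s1)), u) for u != ()."""
    def cut(t, addr):
        # Returns (context with y at addr, subtree of t at addr).
        if not addr:
            return Y, t
        ctx, sub = cut(t[addr[0]], addr[1:])
        return t[:addr[0]] + (ctx,) + t[addr[0] + 1:], sub

    i, rest = u[0], u[1:]
    t_prime, sub = cut(s1[i], rest)             # s1[i] = t'[sigma(xi_1..xi_m)]
    remainder = s1[:i] + s1[i + 1:]             # delta(zeta_1.., zeta_{i+1}..)
    return sub + (plug(rooty(t_prime), remainder),)

a, b, c, d = ("a",), ("b",), ("c",), ("d",)
ex1 = rooty(("f", a, Y, b))
ex2 = rooty(("f", a, ("g", c, Y, d), b))
ex3 = reroot(("f", a, ("g", c, d)), (2, 1))     # re-root f(a, g(c, d)) at c
```

The first two calls reproduce the two examples from the text; the third returns `c(g(f(a), d))`, which is the tree `f(a, g(c, d))` hung up at its `c`-node.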

Lemma 10. From a given SLT grammar $G$ and a $G$-compressed path $p$ one can construct
in polynomial time an SLT grammar $G'$ such that $\mathrm{val}_{uo}(G')$ is isomorphic to
$\mathrm{root}(\mathrm{val}_{ur,uo}(G), \mathrm{val}_G(p))$.
Proof. Let $G = (N, \Sigma, S, P)$ and $\mathrm{ex}_G(p) = (t, u, \sigma, \delta)$. If $u = \varepsilon$ then define $G' = G$.
If $u \ne \varepsilon$ then we can write

$$t = \delta(B_1, \ldots, B_{i-1}, t'[\sigma(\xi_1, \ldots, \xi_m)], B_{i+1}, \ldots, B_k), \tag{5}$$

where $B_j \in N^{(0)}$, $\xi_j \in T_N$, $t'$ is a context composed of nonterminals $A \in N^{(1)}$ and
contexts $f(\zeta_1, \ldots, \zeta_{j-1}, y, \zeta_{j+1}, \ldots, \zeta_l)$ ($f \in \Sigma$, $\zeta_j \in T_N$), and $u = iu'$, where $u'$ is
the Dewey address of the parameter $y$ in $t'$.

We define $G' = (N \uplus N', \Sigma, S, P')$ where $N' = \{A' \mid A \in N^{(1)}\}$. To define
the production set $P'$, we extend the definition of $\mathrm{rooty}$ to contexts from $C_{\Sigma \cup N}$ by (i)
allowing in the trees $t_j$ from Equation (3) also nonterminals, and (ii) defining for every
$B \in N^{(1)}$, $\mathrm{rooty}(B(y)) = B'(y)$. We now define the set of productions $P'$ of $G'$ as
follows: We put all productions from $P$ except for the start production $(S \to s) \in P$
into $P'$. For the start variable $S$ we add to $P'$ the production

$$S \to \sigma(\xi_1, \ldots, \xi_m, \mathrm{rooty}(t')[\delta(B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_k)]).$$

Moreover, let $A \in N^{(1)}$ and $(A(y) \to \zeta) \in P$. If this is a type-(3) production, then we
add $A'(y) \to \zeta$ to $P'$. If $\zeta = B(C(y))$ then we add $A'(y) \to C'(B'(y))$ to $P'$.

Claim: Let $A \in N^{(1)}$. Then $\mathrm{val}_{G'}(A') = \mathrm{rooty}(\mathrm{val}_G(A))$.

The claim is easily shown by induction on the reverse hierarchical structure of $G$: Let
$(A \to t_A) \in P$. If $t_A = f(A_1, \ldots, A_j, y, A_{j+1}, \ldots, A_l)$ then $\mathrm{rooty}(\mathrm{val}_G(A)) =
\mathrm{val}_G(A)$. Since $(A' \to t_A) \in P'$ and $G'$ contains all productions of $G$ except for the
start production, we obtain $\mathrm{val}_{G'}(A') = \mathrm{rooty}(\mathrm{val}_G(A))$. If $t_A = B(C(y))$ then, by
Equation (4), $\mathrm{rooty}(\mathrm{val}_G(B(C(y)))) = \mathrm{rooty}(\mathrm{val}_G(C))[\mathrm{rooty}(\mathrm{val}_G(B))]$. By induction the
latter is equal to $\mathrm{val}_{G'}(C')[\mathrm{val}_{G'}(B')]$, which equals $\mathrm{val}_{G'}(A')$ by the definition of the
right-hand side of $A'$. This proves the claim.

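The above claim implies that $\mathrm{val}_{G'}(\mathrm{rooty}(c(y))) = \mathrm{rooty}(\mathrm{val}_G(c(y)))$ for every context
$c(y)$ that is composed of contexts $f(\zeta_1, \ldots, \zeta_{j-1}, y, \zeta_{j+1}, \ldots, \zeta_l)$ ($\zeta_j \in T_N$) and
nonterminals $A \in N^{(1)}$. In particular, $\mathrm{val}_{G'}(\mathrm{rooty}(t')) = \mathrm{rooty}(\mathrm{val}_G(t'(y)))$ for the
context $t'$ from Equation (5). Hence, with $s_j = \mathrm{val}_{G'}(\xi_j) = \mathrm{val}_G(\xi_j)$ and $t_j =
\mathrm{val}_{G'}(B_j) = \mathrm{val}_G(B_j)$ we obtain

$$\begin{aligned}
\mathrm{val}(G') &= \mathrm{val}_{G'}(\sigma(\xi_1, \ldots, \xi_m, \mathrm{rooty}(t')[\delta(B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_k)])) \\
&= \sigma(s_1, \ldots, s_m, \mathrm{val}_{G'}(\mathrm{rooty}(t'))[\delta(t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_k)]) \\
&= \sigma(s_1, \ldots, s_m, \mathrm{rooty}(\mathrm{val}_G(t'))[\delta(t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_k)]).
\end{aligned}$$

Since $\mathrm{val}(G) = \delta(t_1, \ldots, t_{i-1}, \mathrm{val}_G(t')[\sigma(s_1, \ldots, s_m)], t_{i+1}, \ldots, t_k)$, it follows that
$\mathrm{val}_{uo}(G')$ is isomorphic to $\mathrm{root}(\mathrm{val}_{ur,uo}(G), \mathrm{val}_G(p))$. □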

Corollary 11. The problem of deciding whether $\mathrm{val}_{ur,uo}(G_1)$ and $\mathrm{val}_{ur,uo}(G_2)$ are
isomorphic for given SLT grammars $G_1$ and $G_2$ is PTIME-complete.

Proof. The upper bound follows from Lemma 9, Lemma 10 and Corollary 7. Hardness
for PTIME follows from the PTIME-hardness for dags ED and the fact that isomorphism
of rooted unordered trees can be reduced to isomorphism of unrooted unordered trees
by labelling the roots with a fresh symbol. □
5 Bisimulation on SLT-compressed trees

Fix a set $\Sigma$ of node labels. Let $G = (V, E, \lambda)$ be a directed node-labelled graph, i.e.,
$E \subseteq V \times V$ is the edge relation and $\lambda : V \to \Sigma$ is the labelling function. A binary
relation $R \subseteq V \times V$ is a bisimulation on $G$, if for all $(u, v) \in R$ the following three
conditions hold:

- $\lambda(u) = \lambda(v)$.

- If $(u, u') \in E$, then there exists $v' \in V$ such that $(v, v') \in E$ and $(u', v') \in R$.

- If $(v, v') \in E$, then there exists $u' \in V$ such that $(u, u') \in E$ and $(u', v') \in R$.

Let the relation $\sim$ be the union of all bisimulations on $G$. It is itself a bisimulation (and
hence the largest bisimulation) and an equivalence relation. Two rooted unordered trees
$s, t$ with node labels from $\Sigma$ and roots $r_s, r_t$ are bisimulation equivalent if $r_s \sim r_t$
holds in the disjoint union of $s$ and $t$. For instance, the trees $f(a, a, a)$ and $f(a, a)$ are
bisimulation equivalent but the trees $f(g(a), g(b))$ and $f(g(a, b))$ are not.

For a rooted unordered tree $t$ we define the bisimulation canon $\mathrm{bcanon}(t)$ inductively
as follows: Let $t = f(t_1, \ldots, t_n)$ ($n \ge 0$) and let $b_i = \mathrm{bcanon}(t_i)$. Let $s_1, \ldots, s_m$
be a list of trees such that (i) for every $i \in [m]$, $s_i$ is isomorphic to one of the $b_j$,
and (ii) for every $i \in [n]$ there is a unique $j \in [m]$ such that $s_j$ and $b_i$ are isomorphic
as rooted unordered trees. Then $\mathrm{bcanon}(t) = f(s_1, \ldots, s_m)$. In other words:
Bottom-up, we eliminate repeated subtrees among the children of a node. For instance,
$\mathrm{bcanon}(f(a, a, a)) = f(a) = \mathrm{bcanon}(f(a, a))$. The following lemma can be shown by
a straightforward induction on the height of trees.

Lemma 12. Let $s$ and $t$ be rooted unordered trees. Then $s$ and $t$ are bisimulation
equivalent if and only if $\mathrm{bcanon}(s)$ and $\mathrm{bcanon}(t)$ are isomorphic.
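For explicitly given trees, the bisimulation canon and the test suggested by Lemma 12 can be sketched in a few lines. The encoding is an assumption made for illustration: a tree is a nested tuple `(label, child, ...)`, and an unordered tree is represented by the ordered tree whose children are sorted recursively, so that two unordered trees are isomorphic exactly when these sorted forms are equal.

```python
def canon(t):
    """Sorted normal form of an unordered tree (decides isomorphism)."""
    return (t[0],) + tuple(sorted(canon(c) for c in t[1:]))

def bcanon(t):
    """Bisimulation canon in sorted normal form.

    Children are canonized first; putting their normal forms into a set
    removes isomorphic duplicates among the children of a node."""
    kids = {bcanon(c) for c in t[1:]}
    return (t[0],) + tuple(sorted(kids))

def bisimilar(s, t):
    """Bisimulation equivalence via Lemma 12."""
    return bcanon(s) == bcanon(t)

a, b = ("a",), ("b",)
same = bisimilar(("f", a, a, a), ("f", a, a))
diff = bisimilar(("f", ("g", a), ("g", b)), ("f", ("g", a, b)))
```

Here `same` is true and `diff` is false, matching the two examples given before the definition of the bisimulation canon. On compressed inputs this direct approach is not available, since the trees may be exponentially large.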

The proof of the following theorem is similar to that of Theorem 6.

Theorem 13. From a given SLT grammar $G$ one can compute a new SLT grammar $G'$
such that $\mathrm{val}_{uo}(G')$ is isomorphic to $\mathrm{bcanon}(\mathrm{val}_{uo}(G))$.

Proof. Let $G = (N, \Sigma, S, P)$. We will add polynomially many new nonterminals to $G$
and change the productions for nonterminals from $N^{(0)}$ such that for the resulting SLT
grammar $G'$ we have $\mathrm{uo}(\mathrm{val}_{G'}(Z)) = \mathrm{bcanon}(\mathrm{uo}(\mathrm{val}_G(Z)))$ for every $Z \in N^{(0)}$.
Consider a nonterminal $Z \in N^{(0)}$ and let $M$ be the set of all nonterminals in
$G$ that can be reached from $Z$. By induction, we can assume that $G$ already satisfies
$\mathrm{uo}(\mathrm{val}_G(A)) = \mathrm{bcanon}(\mathrm{uo}(\mathrm{val}_G(A)))$ for every $A \in M^{(0)} \setminus \{Z\}$. Moreover, we can
assume that $G$ contains no distinct nonterminals $A_1, A_2 \in N$ such that $\mathrm{uo}(\mathrm{val}_G(A_1))$
and $\mathrm{uo}(\mathrm{val}_G(A_2))$ are isomorphic. This is justified because by Corollary 7 we can test in
polynomial time whether $\mathrm{uo}(\mathrm{val}_G(A_1))$ and $\mathrm{uo}(\mathrm{val}_G(A_2))$ are isomorphic and replace
$A_2$ by $A_1$ in $G$ in such a case (the tree produced by the new grammar is isomorphic to
$\mathrm{uo}(\mathrm{val}(G))$). Similarly, if there is a type-(1) production $A \to \sigma(A_1, \ldots, A_k)$ such that
$A_i = A_j$ for $i < j$, then we remove $A_j$ from the parameter list, and the same is done for
type-(3) productions. These preprocessing steps do not change the bisimulation canon.
We now distinguish two cases.
Case (i). $Z$ is of type (1), i.e., has a production $Z \to \sigma(A_1, \ldots, A_k)$. By the above
preprocessing, we already have $\mathrm{uo}(\mathrm{val}_G(Z)) = \mathrm{bcanon}(\mathrm{uo}(\mathrm{val}_G(Z)))$, so nothing has
to be done.

Case (ii). $Z$ is of type (2), i.e., has a production $Z \to B(A)$. For $C \in M^{(0)}$ let $n_C =
|\mathrm{val}_G(C)|$ and let

$$J = \{n_C \mid C \in M^{(0)} \setminus \{Z\}\}.$$

We can compute this set of numbers easily in a bottom-up fashion.

Consider the maximal $G(4)$-derivation starting from $B(A)$, i.e.,

$$B(A) \Rightarrow_{G(4)}^* B_1 B_2 \cdots B_N(A),$$

where each $B_i$ is a type-(3) nonterminal. Let $t_k = \mathrm{val}_G(B_k B_{k+1} \cdots B_N(A))$ for $k \in [N]$ and
$t_{N+1} = \mathrm{val}_G(A)$. For a given position $k$ we can compute in polynomial time the size $|t_k|$
by first computing an SLT grammar for $t_k$ and then computing the size of the generated
tree bottom-up. Clearly, the sequence $|t_1|, |t_2|, \ldots, |t_{N+1}|$ is monotonically decreasing.
This allows us to compute, using binary search, the set of positions

$$I = \{i \mid i \in [N+1], |t_i| \in J\}.$$
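The binary search for the set $I$ can be sketched as follows. The sketch assumes that the sizes are strictly decreasing (each $B_i$ contributes at least one node) and that `size(i)` is an oracle returning $|t_i|$ for a position $i$; in the construction above this oracle would be the polynomial-time size computation on an SLT grammar for $t_i$, whereas here it is simply a list lookup. The function name and its parameters are hypothetical.

```python
def positions_with_size_in(size, last, J):
    """Return all positions i in 1..last with size(i) in J.

    Assumes size(1) > size(2) > ... > size(last); every element of J
    costs one binary search, i.e. O(log last) oracle calls."""
    found = set()
    for target in J:
        lo, hi = 1, last
        while lo <= hi:
            mid = (lo + hi) // 2
            s = size(mid)
            if s == target:
                found.add(mid)
                break
            if s > target:
                lo = mid + 1      # sizes shrink towards larger positions
            else:
                hi = mid - 1
    return sorted(found)

# Toy oracle: |t_1|, ..., |t_6| for a derivation with N = 5.
sizes = [20, 17, 11, 6, 5, 1]
I = positions_with_size_in(lambda i: sizes[i - 1], len(sizes), {11, 5, 1, 8})
```

Since $N$ can be exponential in the size of the grammar, the point of the binary search is that only $O(|J| \cdot \log N)$ positions are ever inspected.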

Note that $|I| \le |M^{(0)} \setminus \{Z\}|$ and $N + 1 \in I$. Next, we check in polynomial time, using
Corollary 7, for every position $i \in I$, whether $\mathrm{uo}(t_i)$ is isomorphic to $\mathrm{uo}(\mathrm{val}_G(C))$ for
some $C \in N^{(0)} \setminus \{Z\}$. If such a $C$ exists then we keep $i$ in the set $I$, otherwise we
remove $i$ from $I$. After this step, $I$ contains exactly those positions $i$ of the original set $I$
such that $\mathrm{uo}(t_i)$ is isomorphic to $\mathrm{uo}(\mathrm{val}_G(C))$ for some $C \in N^{(0)} \setminus \{Z\}$.

Assume that $I = \{i_1, \ldots, i_k\}$ with $1 < i_1 < i_2 < \cdots < i_{k-1} < i_k = N + 1$. We
now factorize the string $B_1 B_2 \cdots B_N$ as

$$B_1 B_2 \cdots B_N = u_1 B_{i_1 - 1} u_2 B_{i_2 - 1} \cdots u_k B_{i_k - 1},$$

where $u_j = B_{i_{j-1}} \cdots B_{i_j - 2}$ for $j \in [k]$ (set $i_0 = 1$). By Lemma 3 we can compute
in polynomial time an SLP $G_j$ for the string $u_j$. Moreover, we can compute the
nonterminals $B_{i_j - 1}$ in polynomial time. For the further consideration, we view $G_j$ as a
1-SLT grammar consisting only of type-(4) productions. Note that $\mathrm{val}(G_j)$ is a linear
tree, where every node is labelled with a type-(3) nonterminal.

We now rename the nonterminals in the SLT grammars $G_j$ so that the nonterminal
sets of $G, G_1, \ldots, G_k$ are pairwise disjoint. Let $X_j(y)$ be the start nonterminal of $G_j$
after the renaming. Then we add to the current SLT grammar $G$ the union of all the
$G_j$. Moreover, for every $j \in [k]$ we add a new nonterminal $C_j$ to $G$, whose right-hand
side is derived from the right-hand side of $B_{i_j - 1}$ as follows: Let the right-hand side
for $B_{i_j - 1}$ be $\sigma(A_1, \ldots, A_l, y)$ (we can assume that the parameter occurs at the last
argument position, since this is not relevant for the bisimulation canon). We now check
whether there exists an $A_i$ ($i \in [l]$) such that $\mathrm{uo}(\mathrm{val}_G(A_i))$ is isomorphic to $\mathrm{uo}(t_{i_j})$. If
such an $i$ exists then by our preprocessing it is unique, and we add to $G$ the production
$C_j(y) \to \sigma(A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_l, y)$. If such an $i$ does not exist, then the new
nonterminal $C_j$ is not needed. In order to keep the notation uniform, we let $C_j = B_{i_j - 1}$
in this case. Finally, we redefine the production for $Z$ to

$$Z \to X_1 C_1 X_2 C_2 \cdots X_k C_k(A).$$


This concludes the construction of the SLT grammar $G'$. As in the proof of Theorem 6,
one can argue that the size of $G'$ is polynomially bounded in the size of $G$. □

From Corollary 7, Lemma 12 and Theorem 13 we get:

Theorem 14. For given SLT grammars $G_1$ and $G_2$ one can check in polynomial time
whether $\mathrm{val}_{uo}(G_1)$ and $\mathrm{val}_{uo}(G_2)$ are bisimulation equivalent.


6 Unordered Isomorphism of Non-Linear ST Grammars 

In this section, we consider ST grammars that are not necessarily linear. 

Theorem 15. The question whether $\mathrm{val}_{uo}(G_1)$ and $\mathrm{val}_{uo}(G_2)$ are isomorphic for two
given ST grammars $G_1$ and $G_2$ is PSPACE-hard and in EXPTIME.

Proof. The upper bound follows from Lemma 1 and Corollary 7. For the lower bound,
we use a reduction from QBF. Recall that the input for QBF is a quantified boolean
formula of the form

$$\Psi = Q_1 z_1 Q_2 z_2 \cdots Q_n z_n : \varphi(z_1, \ldots, z_n), \tag{6}$$

where $Q_i \in \{\forall, \exists\}$, the $z_i$ are boolean variables, and $\varphi(z_1, \ldots, z_n)$ is a quantifier-free
boolean formula. We can assume that in $\varphi$, negations only occur in front of variables.
We use a reduction from the evaluation problem for boolean expressions to the
isomorphism problem for explicitly given rooted unordered trees from HQ. Let us take
trees $s_1, s_2, t_1, t_2$. Consider the two trees $s$ and $t$ in Figure 1 that are built up from
$s_1, s_2, t_1, t_2$. Clearly, $s \cong t$ ($s$ and $t$ are isomorphic) if and only if $s_1 \cong t_1$ and $s_2 \cong t_2$.
Similarly, for the trees $s$ and $t$ from Figure 2 we have $s \cong t$ if and only if $s_1 \cong t_1$ or
$s_2 \cong t_2$.

Fix the ranked alphabet $\Sigma = \{f, a, b, 0, 1\}$. We will construct a non-linear ST grammar
$G$ (without start variable), which contains for every subformula $\psi(v_1, \ldots, v_m)$
(where $\{v_1, \ldots, v_m\} \subseteq \{z_1, \ldots, z_n\}$ is the set of free variables of $\psi$) of the formula
$\Psi$ from (6), two nonterminals $A_\psi(v_1, \ldots, v_m)$ and $B_\psi(v_1, \ldots, v_m)$ such that
for all truth values $c_1, \ldots, c_m \in \{0, 1\}$: $\psi(c_1, \ldots, c_m)$ evaluates to 1 if and only if
$\mathrm{val}_G(A_\psi)[v_i \leftarrow c_i \mid i \in [m]]$ and $\mathrm{val}_G(B_\psi)[v_i \leftarrow c_i \mid i \in [m]]$ are isomorphic as
rooted unordered trees.

The base case is that of a literal $z$ or $\neg z$. We introduce the following productions:

$$A_z(z) \to f(z, 1), \quad B_z(z) \to f(1, z), \quad A_{\neg z}(z) \to f(z, 0), \quad B_{\neg z}(z) \to f(0, z).$$

Now let $\psi(v_1, \ldots, v_m) = \psi_1(x_1, \ldots, x_k) \wedge \psi_2(y_1, \ldots, y_l)$ be a subformula of the
quantifier-free part $\varphi(z_1, \ldots, z_n)$ in $\Psi$, where $\{v_1, \ldots, v_m\} = \{x_1, \ldots, x_k, y_1, \ldots, y_l\}$.
Then we use the and-gadget from Figure 1 and set (where $\bar{v} = (v_1, \ldots, v_m)$ and
similarly for $\bar{x}$ and $\bar{y}$)

$$A_\psi(\bar{v}) \to f(a(A_{\psi_1}(\bar{x})), b(A_{\psi_2}(\bar{y}))) \quad \text{and}$$
$$B_\psi(\bar{v}) \to f(a(B_{\psi_1}(\bar{x})), b(B_{\psi_2}(\bar{y}))).$$

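If $\psi(v_1, \ldots, v_m) = \psi_1(x_1, \ldots, x_k) \vee \psi_2(y_1, \ldots, y_l)$ then we use the or-gadget from
Figure 2 and set

$$A_\psi(\bar{v}) \to f(f(a(A_{\psi_1}(\bar{x})), b(B_{\psi_2}(\bar{y}))), f(a(B_{\psi_1}(\bar{x})), b(A_{\psi_2}(\bar{y})))) \quad \text{and}$$
$$B_\psi(\bar{v}) \to f(f(a(A_{\psi_1}(\bar{x})), b(A_{\psi_2}(\bar{y}))), f(a(B_{\psi_1}(\bar{x})), b(B_{\psi_2}(\bar{y})))).$$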
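The behaviour of the two gadgets can be checked exhaustively on small explicit trees. In the sketch below the gadget shapes are read off from the productions for $\wedge$ and $\vee$ above (the figures themselves are not reproduced here), trees are nested tuples `(label, child, ...)`, and unordered isomorphism is decided by comparing recursively sorted forms; all function names are hypothetical.

```python
from itertools import product

def canon(t):
    """Sorted normal form of an unordered tree."""
    return (t[0],) + tuple(sorted(canon(c) for c in t[1:]))

def iso(s, t):
    return canon(s) == canon(t)

def f(x, y): return ("f", x, y)
def a(x): return ("a", x)
def b(x): return ("b", x)

def and_gadget(x1, x2):
    # Shape of both right-hand sides in the production for a conjunction.
    return f(a(x1), b(x2))

def or_gadget_s(s1, s2, t1, t2):
    # Shape of the right-hand side of A_psi for a disjunction.
    return f(f(a(s1), b(t2)), f(a(t1), b(s2)))

def or_gadget_t(s1, s2, t1, t2):
    # Shape of the right-hand side of B_psi for a disjunction.
    return f(f(a(s1), b(s2)), f(a(t1), b(t2)))

LEAVES = [("0",), ("1",)]   # two non-isomorphic one-node trees

and_ok = all(
    iso(and_gadget(s1, s2), and_gadget(t1, t2)) == (s1 == t1 and s2 == t2)
    for s1, s2, t1, t2 in product(LEAVES, repeat=4)
)
or_ok = all(
    iso(or_gadget_s(s1, s2, t1, t2), or_gadget_t(s1, s2, t1, t2))
    == (s1 == t1 or s2 == t2)
    for s1, s2, t1, t2 in product(LEAVES, repeat=4)
)
```

Both flags evaluate to true over all 16 combinations, which is the property of the gadgets that the reduction relies on.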

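For a quantified subformula $\psi(z_1, \ldots, z_{i-1}) = \forall z_i\, \psi'(z_1, \ldots, z_{i-1}, z_i)$, we can define
the productions similarly (let $\bar{z} = (z_1, \ldots, z_{i-1})$):

$$A_\psi(\bar{z}) \to f(a(A_{\psi'}(\bar{z}, 0)), b(A_{\psi'}(\bar{z}, 1))) \quad \text{and}$$
$$B_\psi(\bar{z}) \to f(a(B_{\psi'}(\bar{z}, 0)), b(B_{\psi'}(\bar{z}, 1))).$$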

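Finally, for $\psi(z_1, \ldots, z_{i-1}) = \exists z_i\, \psi'(z_1, \ldots, z_{i-1}, z_i)$ we set

$$A_\psi(\bar{z}) \to f(f(a(A_{\psi'}(\bar{z}, 0)), b(B_{\psi'}(\bar{z}, 1))), f(a(B_{\psi'}(\bar{z}, 0)), b(A_{\psi'}(\bar{z}, 1)))) \quad \text{and}$$

$$B_\psi(\bar{z}) \to f(f(a(A_{\psi'}(\bar{z}, 0)), b(A_{\psi'}(\bar{z}, 1))), f(a(B_{\psi'}(\bar{z}, 0)), b(B_{\psi'}(\bar{z}, 1)))).$$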

This concludes the construction of the ST grammar $G$. Let $G = (N, \Sigma, P)$. Then we
define the two ST grammars $G_1 = (N, \Sigma, A_\Psi, P)$ and $G_2 = (N, \Sigma, B_\Psi, P)$. We have
$\mathrm{val}_{uo}(G_1) \cong \mathrm{val}_{uo}(G_2)$ if and only if the formula $\Psi$ is true. □

The complexity bounds from Theorem 15 also hold if we want to check whether the
unrooted unordered trees $\mathrm{val}_{ur,uo}(G_1)$ and $\mathrm{val}_{ur,uo}(G_2)$ are isomorphic: Membership
in EXPTIME follows from Lemma 1 and Corollary 11. For PSPACE-hardness, one can
take the reduction from the proof of Theorem 15 and label the roots of the final trees
with a fresh symbol. Finally, the above PSPACE-hardness proof can also be used for the
bisimulation equivalence problem for trees given by ST grammars (the gadgets from
Figures 1 and 2 can be reused). Hence, bisimulation equivalence for trees given by ST
grammars is PSPACE-hard and in EXPTIME. Since an ST grammar can be transformed
into a hierarchical graph definition for a dag (see the proof of Lemma 1), we rediscover
the following result from 0: Bisimulation equivalence for dags that are given by
hierarchical graph definitions is PSPACE-hard and in EXPTIME.

7 Open problems 

The obvious remaining open problem is the precise complexity of the isomorphism
problem for unordered trees that are given by ST grammars. Theorem 15 leaves a gap
from PSPACE to EXPTIME. Another interesting open problem is the isomorphism
problem for graphs that are given by hierarchical graph definitions. To the knowledge of the
authors, this problem has not been studied so far.